Distributionally Robust Optimization for Sequential Decision Making
The distributionally robust Markov Decision Process (MDP) approach asks for a
distributionally robust policy that achieves the maximal expected total reward
under the most adversarial distribution of uncertain parameters. In this paper,
we study distributionally robust MDPs whose ambiguity sets for the uncertain
parameters take a form that can easily incorporate both generalized-moment
information and statistical-distance information about the uncertainty.
In this way, we generalize existing work on distributionally robust MDPs with
generalized-moment-based and statistical-distance-based ambiguity sets,
incorporating information from the former class, such as moments and
dispersions, into the latter class, which critically depends on empirical
observations of the uncertain parameters. We show that, under this form of
ambiguity set, the resulting distributionally robust MDP remains tractable
under mild technical conditions. More specifically, a distributionally robust
policy can be constructed by solving a sequence of one-stage convex
optimization subproblems.
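To make "a sequence of one-stage subproblems" concrete, here is a minimal sketch of robust value iteration. It is not the authors' method: it assumes an (s, a)-rectangular ambiguity set and swaps in a plain L1 (total-variation) ball around a nominal transition distribution `P0[s][a]` instead of the paper's combined moment-and-distance sets. For that ball, each one-stage worst-case problem is a small LP with a greedy solution. The names `worst_case_expectation`, `robust_value_iteration`, `eps`, and the list-based MDP encoding are hypothetical.

```python
from typing import List, Sequence, Tuple


def worst_case_expectation(p0: Sequence[float], v: Sequence[float], eps: float) -> float:
    """One-stage subproblem (assumed L1 ambiguity set, not the paper's):
    min_p sum_i p_i * v_i over the probability simplex with ||p - p0||_1 <= eps.

    Greedy solution: move up to eps/2 probability mass from the
    highest-value states onto the lowest-value state.
    """
    n = len(p0)
    p = list(p0)
    lowest = min(range(n), key=lambda i: v[i])
    budget = min(eps / 2.0, 1.0 - p[lowest])
    p[lowest] += budget
    # Remove the same amount of mass, starting from the highest-value states.
    for i in sorted(range(n), key=lambda i: v[i], reverse=True):
        if budget <= 0.0:
            break
        if i == lowest:
            continue
        take = min(p[i], budget)
        p[i] -= take
        budget -= take
    return sum(pi * vi for pi, vi in zip(p, v))


def robust_value_iteration(
    P0: List[List[List[float]]],  # P0[s][a] = nominal next-state distribution
    R: List[List[float]],         # R[s][a] = immediate reward
    gamma: float,
    eps: float,
    iters: int = 500,
) -> Tuple[List[float], List[int]]:
    """Robust Bellman recursion: each backup solves one worst-case subproblem per (s, a)."""
    n_states = len(P0)
    V = [0.0] * n_states

    def q(s: int, a: int, V: List[float]) -> float:
        # Reward plus discounted worst-case expected continuation value.
        return R[s][a] + gamma * worst_case_expectation(P0[s][a], V, eps)

    for _ in range(iters):
        V = [max(q(s, a, V) for a in range(len(P0[s]))) for s in range(n_states)]
    # Greedy (robust) policy with respect to the final value estimate.
    policy = [max(range(len(P0[s])), key=lambda a: q(s, a, V)) for s in range(n_states)]
    return V, policy
```

With `eps = 0.0` the recursion reduces to ordinary value iteration on the nominal model; increasing `eps` can only lower the computed values, since the adversary gets more freedom in each one-stage subproblem.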
Model and Reinforcement Learning for Markov Games with Risk Preferences
We motivate and propose a new model for non-cooperative Markov games that
considers the interactions of risk-aware players. This model characterizes the
time-consistent dynamic "risk" from both stochastic state transitions (inherent
to the game) and randomized mixed strategies (due to all other players). An
appropriate risk-aware equilibrium concept is proposed, and the existence of
such equilibria in stationary strategies is demonstrated by an application of
Kakutani's fixed-point theorem. We further propose a simulation-based
Q-learning-type algorithm for risk-aware equilibrium computation. This
algorithm works with a special class of minimax risk measures that can
naturally be written as saddle-point stochastic optimization problems; this
class covers many widely studied risk measures. Finally, the almost sure
convergence of this simulation-based algorithm to an equilibrium is
established under mild conditions. Our numerical experiments on a two-player
queuing game validate the properties of our model and algorithm, and
demonstrate their value and applicability in real-life competitive
decision-making.
Comment: 38 pages, 6 tables, 5 figures
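As a small illustration of estimating a risk measure written as a stochastic optimization problem from simulated samples, the sketch below uses conditional value-at-risk (CVaR) of a loss `X` via the Rockafellar-Uryasev representation, CVaR_alpha(X) = min over eta of { eta + E[(X - eta)^+] / (1 - alpha) }. This is only an assumed example: it is a pure minimization (no inner maximization), it is not the paper's minimax class or its Q-learning algorithm, and the function name, step rule `step / k`, and sampler interface are hypothetical choices.

```python
import random
from typing import Callable, Tuple


def cvar_stochastic_approx(
    sample: Callable[[random.Random], float],  # draws one simulated loss X
    alpha: float,
    n_iter: int,
    step: float = 0.1,
    seed: int = 0,
) -> Tuple[float, float]:
    """Simulation-based estimates of (VaR_alpha, CVaR_alpha) for a loss X.

    eta follows stochastic subgradient steps on
        eta + E[(X - eta)^+] / (1 - alpha),
    and cvar is the running mean of that objective's samples at the current eta.
    """
    rng = random.Random(seed)
    eta = 0.0   # VaR estimate (minimizer of the objective)
    cvar = 0.0  # running CVaR estimate (objective value)
    for k in range(1, n_iter + 1):
        x = sample(rng)
        # Sample of the objective at the current eta.
        obj = eta + max(x - eta, 0.0) / (1.0 - alpha)
        cvar += (obj - cvar) / k
        # Stochastic subgradient w.r.t. eta: 1 - 1{x > eta} / (1 - alpha).
        g = 1.0 - (1.0 if x > eta else 0.0) / (1.0 - alpha)
        eta -= (step / k) * g
    return eta, cvar
```

For example, `cvar_stochastic_approx(lambda r: r.random(), 0.9, 200_000)` simulates Uniform(0, 1) losses, whose exact VaR and CVaR at level 0.9 are 0.9 and 0.95; the estimates should land close to those values.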
An Inexact Primal-Dual Smoothing Framework for Large-Scale Non-Bilinear Saddle Point Problems
We develop an inexact primal-dual first-order smoothing framework to solve a
class of non-bilinear saddle-point problems with primal strong convexity.
Compared with existing methods, our framework yields a significant improvement
in primal oracle complexity while retaining competitive dual oracle
complexity. In addition, we consider the situation where the primal-dual
coupling term has a large number of component functions. To efficiently handle
this situation, we develop a randomized version of our smoothing framework,
which allows the primal and dual subproblems in each iteration to be solved
inexactly, in expectation, by randomized algorithms. The convergence of this
framework is analyzed both in expectation and with high probability. In terms
of the primal and dual oracle complexities, this framework significantly
improves over its deterministic counterpart. As an important application, we
adapt both frameworks for solving convex optimization problems with many
functional constraints. To obtain an $\epsilon$-optimal and
$\epsilon$-feasible solution, both frameworks achieve the best-known oracle
complexities (in terms of their dependence on $\epsilon$)
- …
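The functional-constraint application can be illustrated with a textbook instance of dual smoothing. This is an assumed sketch, not the authors' inexact framework: it takes the Lagrangian saddle-point form of min 0.5*||x - c||^2 subject to Ax <= b (quadratic objective and linear constraints chosen for illustration), subtracts a strongly concave term (mu/2)*||y||^2 in the dual so the inner maximization over y >= 0 has the closed form y_i = max(a_i.x - b_i, 0) / mu, and then solves each smoothed primal problem inexactly with a fixed number of gradient steps while mu decreases. With this particular regularizer the smoothed problem coincides with a quadratic-penalty method. All names and the `mus` schedule are hypothetical.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], w: Sequence[float]) -> float:
    return sum(ui * wi for ui, wi in zip(u, w))


def smoothed_gradient(x: List[float], c: Sequence[float],
                      A: Sequence[Sequence[float]], b: Sequence[float],
                      mu: float) -> List[float]:
    """Gradient of 0.5*||x - c||^2 + sum_i max(a_i.x - b_i, 0)^2 / (2*mu).

    The second term equals max_{y >= 0} [y.(Ax - b) - (mu/2)*||y||^2];
    its maximizer y_i = max(a_i.x - b_i, 0) / mu enters the gradient below.
    """
    g = [xi - ci for xi, ci in zip(x, c)]
    for a_i, b_i in zip(A, b):
        y_i = max(dot(a_i, x) - b_i, 0.0) / mu  # smoothed dual variable
        for j in range(len(x)):
            g[j] += y_i * a_i[j]
    return g


def smoothed_constrained_solve(
    c: Sequence[float], A: Sequence[Sequence[float]], b: Sequence[float],
    mus: Sequence[float] = (1.0, 0.1, 0.01, 0.001), inner_iters: int = 200,
) -> Tuple[List[float], List[float]]:
    x = list(c)
    fro_sq = sum(dot(a_i, a_i) for a_i in A)  # ||A||_F^2 bounds ||A||_2^2
    for mu in mus:
        # Lipschitz constant of the smoothed gradient: 1 + ||A||_2^2 / mu.
        L = 1.0 + fro_sq / mu
        for _ in range(inner_iters):  # inexact primal solve, warm-started
            g = smoothed_gradient(x, c, A, b, mu)
            x = [xj - gj / L for xj, gj in zip(x, g)]
    y = [max(dot(a_i, x) - b_i, 0.0) / mus[-1] for a_i, b_i in zip(A, b)]
    return x, y
```

For c = (1, 1) and the single constraint x1 + x2 <= 1, the exact solution is x = (0.5, 0.5) with multiplier 0.5; `smoothed_constrained_solve([1.0, 1.0], [[1.0, 1.0]], [1.0])` approaches both as mu shrinks, with a constraint violation on the order of mu.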